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Abstract 

The aim of this paper is to investigate from the numerical point of view 
the possibility of coupling the Hamilton- Jacobi-Bellman (HJB) equation 
and Pontryagin's Minimum Principle (PMP) to solve some control prob- 
lems. A rough approximation of the value function computed by the HJB 
method is used to obtain an initial guess for the PMP method. The 
advantage of our approach over other initialization techniques (such as 
continuation or direct methods) is to provide an initial guess close to the 
global minimum. Numerical tests involving multiple minima, discontinu- 
ous control, singular arcs and state constraints are considered. The CPU 
time for the proposed method is less than four minutes up to dimension 
four, without code parallelization. 

Keywords. Optimal control problems, minimum time problem, Hamilton- 
Jacobi-Bellman equations, Pontryagin's minimum principle, shooting method. 
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1 Introduction 

In optimal control, the so-called direct methods, based on discretization and 
nonlinear programming, are currently the most popular. The development of 
many powerful codes in the recent years, such as NUDOCCCS 7] . MUSCOD 
[2], or IPOPT [3], allowed to solve difficult and complex problems [HH]. On the 
other hand, the indirect methods, based on Pontryagin's Minimum Principle 
(PMP), are both fast and accurate but tend to suffer from a great sensitivity 
to the initialization. The aim of this paper is to show that this difficulty can be 
overcome by coupling the indirect methods with the Hamilton- Jacobi-Bellman 
(HJB) approach. 

The HJB theory and PMP are usually considered two separate worlds al- 
though they deal with the same kind of problems. The theoretical connections 
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between the two approaches are well known [5J [7J [HI [2] , but coupled usage of 
the two techniques is not common and not completely explored. 
In this paper we deal with the following controlled dynamics 

/ il(t) = f(y(t),u(t)), t>0 

\ y(0) = x, XE R d V ' 

where the control variable u(-) E U := {u : M + — > U, u measurable} and 
U C K m (m > 1). We will denote by y x (t;u) the solution of the system (JT]) 
starting from the point x with control u. Let C C M d be a given target. For any 
given control u we denote by tf(x,u) the first time the trajectory y x (t;u) hits 
C (we set tf(x,u) = +oo if the trajectory never hits the target). We also define 
a cost functional J as 

ptf(x,u) 

J(x,u) := / e(y x (t;u),u(t))dt, (2) 

Jo 

where I is a suitable cost function. The final goal is to find 

u* G U such that J(x,u*) — min J(x,u) , (3) 

and compute the associated optimal trajectory y*(t; u*). 
Finally, we define the value function 

T(x) := J{x,u*), xeR d . 

Choosing t = 1 in ([2]) we obtain the classical minimum time problem. 

The PMP approach consists in finding a trajectory which satisfies some 
necessary conditions. This is done in practice by searching a zero of a certain 
shooting function, typically with a (quasi-)Ncwton method. This method is 
well known and it is used in many applications, see [101 111! 112) and references 
therein. The main advantages of this approach lie in its accuracy and in its low 
numerical complexity. It is worth to recall that the dimension of the nonlinear 
system for the shooting method is usually 2d, where d is the state dimension. 
This is in practice quite low for this kind of problem, therefore fast convergence 
is expected in case of success, especially if the initial guess is close to the right 
value. Unfortunately, finding a suitable initial guess can be extremely difficult 
in practice. The algorithm may either not converge at all, or converge to a local 
minimum of the cost functional. 

The HJB approach is based on the Dynamic Programming Principle [13j . It 
consists in characterizing the value function T as the solution of a first-order 
nonlinear partial differential equation. Once an approximation of the value 
function is computed, they are easily obtained both the optimal control u* in 
feedback form and, by a direct integration, the optimal trajectories for any 
starting point x £ K d O \T5\ . The method is greatly advantageous because it 
is able to reach the global minimum of the cost functional, even if the problem 
is not convex. The HJB approach allows also to have a global overview of the 
optimal trajectories and of the reachable sets (or capture basins) i.e. the sets 
of the points from which it is possible to reach the target in any given time. 
Beside all the advantages listed above, the HJB approach suffers from the well 
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known " curse of dimensionality" , so in general it is restricted to problems in 
low dimension (d, < 3). 

In this paper we couple the two methods in such a way we can preserve the 
respective advantages. The idea is to solve the problem via the HJB method on 
a coarse grid, to have in short time a first approximation of the value function 
and the structure of the optimal trajectory. Then, we use this information to 
initialize the PMP method and compute a precise approximation of the global 
minimum. To our knowledge this is the first attempt to exploit the connection 
between HJB and PMP theories from the numerical point of view. 

Compared to the use of continuation techniques or direct methods to obtain 
an estimate of the initial costate, the main advantage of the approach presented 
here is that the HJB method provides an initial guess close to the global min- 
imum. The main limitation is the restriction with respect to the dimension of 
the state. 

We consider some known control problems with different specific difficulties: 
several local minima, several global minima, discontinuous control, presence 
of singular arcs, and state constraints. In all these problems, we show that 
combining PMP method with HJB approach leads to a very efficient algorithm. 

2 Preliminaries 

Consider optimal control problems in the general Bolza form, autonomous case, 
with a fixed or free final time. 



Here U is a compact set of R m and the following classical assumptions are 
satisfied: 

- / : R d x U -> R d and I : R d x U -> K arc continuous, and arc of class C 1 
with respect to the first variable. 

- C is a closed subset of R d for which the property " a vector is normal to C 
at a point of C" makes sense. For instance, C can be described by a finite 
set of equalities {ci(x) = 0}i or inequalities {ci(x) < 0}i, with Ci being of 
class C 1 for every i, and the classical constraint qualification assumptions 
hold. For numerical purposes we assume C is bounded. 

2.1 Pontryagin's Minimum Principle approach 

We give here a brief overview of the so-called indirect methods for optimal 
control problems [HI 23 EH] • We introduce the costate p, of same dimension d 
as the state x, and define the Hamiltonian 



(P)< 



' min J{x,u) = Jl f{x ' u) i(y(t),u{i))dt 
y(t) = f(y(t),u(t)) 
u(t) 6 U for a.e. t e (0,tf(x,u)) 
2/(0) = x 
k y(t f (x,u)) £ C 



Objective 
Dynamics 
Admissible Controls 
Initial Conditions 
Terminal Conditions 



H(y,P, u,Po) = Pd{y, u)+ < p, f(y, u) > . 
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Under the assumptions on / and I introduced above, Pontryagin's Minimum 
Principle states that if (y*, u*, tf) is a solution of (P), then there exists (po,P*) ^ 
absolutely continuous such that 

y*(t)=H p (y x (t),p*(t),u*(t),p ), y*(0) = x, (4a) 

p*{t) = -H y (y x (t),p*(t),u*(t),p ), (4b) 

p*(t* f ) ±T c (y* x (t})), (4c) 

u*(t) = argmin H(y*(t),p*(t),v,p Q ) for a.e. t £ [0,t% (4d) 

where Tc(£) denotes the contingent cone of C at £. Moreover, if the final time 
ty is not fixed and is an optimal time, then we have the additional condition: 

H(y x (t),p*(t),u*(t),p ) = 0, forte(0,t}). (5) 

Two common cases are C — {y/} with p*{t*^) free, and C = W l with p*(t*j) = 0. 

Now we assume that minimizing the Hamiltonian provides the control as a 
function 7 of the state and costate. For a given value of p(0), we can integrate 
(y,p) by using the control u — j(y,p) on [0, tf]. We define the shooting function 
S that maps the unknown p(0) to the value of the final and transversality 
conditions at (y(tf),p(tf)). Finding a zero of S gives a trajectory {y,u) that 
satisfies the necessary conditions for the problem (P). This is typically done in 
practice by applying a (quasi-)Newton method. 

Remark 2.1 The multiplier po could be equal to 0. In that case, the PMP is 
said anormal, its solution (y* ,p* , u*) corresponds to a "singular" extremal which 
does not depend on the cost function I. Several works have been devoted to the 
existence ( or nonexistence ) of such extremal curves \1(K \20\j . For numerics, in 
general we assume that po ^ which leads to solve the PMP system with po = 1 . 
In the sequel, we will always assume that we are in the normal case (po = 1). 



Singular arcs. A singular arc occurs when minimizing the Hamiltonian fails 
to determine the optimal control u* on a whole time interval. The typical 
context is when H is linear with respect to u, with an admissible set of controls 
of the form U = [ui ow , u up \. In this particular case, the function (y,p,u) 1 — > 
H u (y,p,u) does not depend on the control variable. We define the switching 
function tp(y,p) — H u (y,p,u) and have the following bang-bang control law: 

iftp(y,p)>0 then u* = u iow 
if ip(y,p) < then u* = u up 
if ip(y,p) = then switching or singular control. 

A singular arc then corresponds to a time interval where the switching function ip 
is zero. The usual way to obtain the singular control is to differentiate ip with re- 
spect to t until the control explicitly appears, which leads to solving an equation 
of the form -0( 2fc ) (y,p) — 0, see [T7]. This step can be quite difficult in practice, 
depending on the problem. Moreover, it is also required to make assumptions 
about the control structure, more precisely to fix the number of singular arcs. 
Each expected singular arc adds two shooting unknowns (t entryi t ex n), with the 
corresponding junction conditions ip(t en t r y) = ip(t en try) = or alternatively 
) = ip(t exlt ) — 0. The problem studied in section [O] presents such a 
singular arc. 
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State constraints. Wc consider a state variable inequality constraint g(y(t)) < 0. 
We denote by q the smallest order such that depends explicitly on the con- 
trol u; q is called the order of the constraint g. The Hamiltonian is defined with 
an additional term for the constraint 

H(y,p,u) = e(y,u)+ <p,f(y,u) > +^g {q) {y,u) 

with the sign condition 

f ii = if g < 
\ H > if g = 0. 

When the constraint is inactive we are in the same situation as for an uncon- 
strained problem. Over a constrained arc where g(y) = 0, we obtain the control 
from the equation g( q '(y,u) = 0, and /i from the equation H u = 0. As in the 
singular arc case, we need to make assumptions concerning the control struc- 
ture, namely the number of constrained arcs. Each expected constrained arc 
adds two shooting unknowns (t en try,t e xit) with the Hamiltonian continuity as 
corresponding conditions. We also have the so-called tangency condition at the 
entry point 

N(y(t ent ry)) ■= (g(y(t en try)), ■ ■ ■ , (y(t entry))) = 0, 

with the costate discontinuity 

P^tntry) = P^entry) ~ ^N y {y{t entry )) 

where it G WL q is another multiplier yielding an additional shooting unknown. 

Remark 2.2 The tangency condition can also be enforced at the exit time, in 
this case the costate jump occurs at the exit time as well. 

2.2 Hamilton- Jacobi-Bellman approach 

Consider the value function T : R'' — * R, which maps every initial condition x S 
R d to the minimal value of the problem (P). It is well known (see for example 
21J for a comprehensive introduction) that the value function T satisfies a 
Dynamic Programming Principle and the Kruzkov transform of T, defined by 

v(x) :=l-e- r ^ x \ 

is the unique solution (in viscosity sense |21j ) of the following HJB equation 

f v(x) + sup{-f(x,u)-Dv(x)-£(x,u) + (i(x,u)-l)v(x))} = x 6 R d \C 
<^ ueu 

{ v(x) = xeC. 

(6) 

Obtaining a numerical approximation of the function v is a difficult task, mainly 
because v is not always differentiable. Several numerical schemes have been 
studied in the literature. In this paper we will use a first-order semi-Lagrangian 
(SL) scheme [HI [H]. This choice is motivated by the fact that SL scheme 
seems the best one in order to approximate the gradient of the value function, 
this being our goal as we will see in the next section. More precisely, a first- 
order finite difference scheme is less accurate (but faster) because is not able to 
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follow the characteristics along diagonal directions, his stencil being limited to 
the four neighbouring nodes (plus the considered node itself), see for example 
[22] . Ultra-Bee scheme is accurate, but the solution is quite stair-shaped and 
not suitable for the approximation of the partial derivatives, see for example 

[221 

We fix a (numerical) bounded domain Q D C and we discretize it by means of 
a regular grid G = {xi, i = 1, . . . , Nq}, where Nq is the total number of nodes. 
We denote by v{x; h, k, f2) the fully discrete approximation of v, h and k being 
two discretization parameters (the first one can be interpreted as a time step to 
integrate along characteristics and the second one is the usual space step). We 
impose state constraint boundary conditions on dfl. The discrete version of ([6]) 
is 

f v( Xi ) = H\v\( Xi ) x, e (ft\c) n G (7) 
\ v(xi) = i,eCnG { ' 

where 

H\v\(xi) := mm{P 1 (v;x l + hf(x t ,u)) + he(x t ,u)(l -v( Xl ))} (8) 

u£LU 

and Fi(y;xi + hf(xi,u)) denotes the value of v at the point Xi + hf[xi,u) 
obtained by linear interpolation (note that the point xt + hf(xi,u) is not in 
general sitting on the grid). The numerical scheme consists in iterating the 
fixed point sequence 

= H n=l,2,... (9) 

until convergence, starting from v^(xi) = on C and 1 elsewhere. To speed 
up the convergence we use the Fast Sweeping technique [21] . The function v is 
then extended to the whole space by linear interpolation. Once the function v 
is computed, we get easily the corresponding approximation T of T, and then 
the optimal control law in feedback form, see [T31[T5] for details. 

It is useful to note that the equation © can also model a front (interface) 
propagation problem. Following this interpretation, the boundary of the target 
dC is the front at initial time t = 0, and the level set {x : T(x) = t} represents 
the front at any time t > 0. 

3 Coupling HJB and PMP 

3.1 Main connection 

It is known [7] that for a general control problem with free end-point, if the value 
function is differentiable at some point x € K d then it is differentiable along the 
optimal trajectory starting at x. Actually, the gradient of the value function is 
equal to the costate of Pontryagin's principle. In the context of minimum time 
problems (with target constraint), the link between the minimum time function 
and Pontryagin's principle has been also investigated in several papers E] , 
proving the same connection. 

The main idea of the paper is to compute an approximation of the value 
function T solving the HJB equation on a rough grid, then approximate DT(x) 
(x being the starting point) and finally use it as initial guess for p(0). The 
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approximation of the gradient can be obtained by standard first-order centred 
finite differences. 

In the case T ^ C 1 (M. d ), it is proved in [5] that a connection between the 
two approaches still exists. More precisely, under some additional assumptions, 
we have 

p*(t) e D+T{y*(t)) forte[0,T(x)}, 
where D + T(x) is the superdifferential of T at x defined by 

D+T(x) := limsup ?M Z T ^ Zp^ll < 1 . (10) 

I »-* \y-x\ J 

In the rest of this section we assume that D + T(x) ^ 0. It is plain that we can 
not use finite difference approximation in order to compute p(0) at the points 
where the value function T is not differentiable. Rather than that, we follow 
a different strategy. Let S > be is a small positive parameter, and B(0, 1) 
denote the unit ball in R d centred at 0. We first compute the vector 

~, , = T(x + 5C)-T(x) ~ w . th min T (x + SQ. (11) 

C6S(0,1) 

Note that C* is an approximation of the direction of maximal decrease of T, and 
||£*|| is an approximation of the directional derivative of T along the direction 
C*. Since in the case T e C 1 (M d ) the direction £* is a first-order approximation 
of DT(x), we will use £* as initial guess for the costate p(0). 

Let us explain on a simple example why we choose the definition (lll[) . Con- 
sider the case 

d=2, C = {(3,0)}U{(-3,0)}, 1 = 1, f = u, [7 = 5(0,1). 

In this case the HJB equation reduces to the simple eikonal equation. On the 
line {x = 0} the function T is not differentiable, see the level sets in Fig. Q~|-left. 
This line corresponds to a zone where two optimal trajectories are available, 




Figure 1: two crossing fronts with and without superimposition. Arrows corre- 
spond to the (two) vector(s) £* 



i.e. the functional J has two global minima. Following the front propagation 
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interpretation (see end of section |2~2"|) . here we have two fronts which hit each 
other at the line {x = 0}. The viscosity solution of the HJB equation selects 
automatically the first arrival time so we never see the two crossing fronts, but 
we could in principle follow the propagations of the two fronts separately (see 
Fig. [T]- right), and then compute the two gradients of the two value functions. 
These two gradients correspond to two optimal choices for p(0). By means of 
(|lip . we can approximate the two gradients without splitting the evolutions of 
the fronts. We first compute the two directions (*, Q °f maximal decrease of 
the function T (see Fig. EJright), and then the "two gradients" £*, £2 °f T . 

In the present example, focusing on the point (0,0), the two directions of 
maximal decrease are (—1,0) and (1,0). It is easy to show that these two vectors 
coincide with the two "extremal" vectors in D + T(x), namely the vectors 77 
verifying 

,. T{y) - r i x ) -v-(y-x) n / 1Q s 

hmsup j j — 0. (12) 

y^x \y-x\ 

Although this relationship is not true for every function T such that D + T(x) ^ 
0, it is easy to sec that it is true whenever the curve of non-diffcrcntiability is 
due to the collision of two or more fronts (as in Problem 1, Section R~Tj) . 

In section H] we will show that, beside an initial guess for p(0), also other 
useful data can be extrapolated from the value function, and used to start the 
shooting method. 



3.2 Convergence of DT 

Let us denote by D — (D\, . . . , Dd) the discrete gradient computed by centred 
finite differences with step z > 0, 

D t T(x) ^ n* + *ei)^n*-*ei) t i = 1> ... >d 

where {ej },-=!,. „ )( j is the standard basis of R d . 

Many papers (see for example [57] in the context of differential games) 
investigated the convergence of the approximate value function v(- ; h, k, 51) to 
the exact solution v when the parameters h, k tend to zero and O tends to 
R d . These results were quite difficult to be obtained because the function v is 
not in general differentiablc. To our purposes we have to go farther, proving 
the convergence of T(- ;h, k, O) = — ln(l — v(- ;h,k,tt)) to T and then the 
convergence of DT(- ; h, k, fl) to DT, because the latter will be used by the 
PMP method as initial guess. 

Let us assume that k = C\h for some positive constant C±. Given a generic 
estimate of the form 

\\v(-;h,R d )-v(-)\\ L ^ (Rd) <Ch a , C,a>0 (13) 

we have the following 

Theorem 3.1 Assume that T 6 C (O) and there exists T max > such that 
< T{x) < T m ax for all x G fl. 

Let us define 

E{x) := \\DT{x;h,n)-DT(x)\\ 00 
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where \\ is the maximum norm in M. d . 
Then there exists Sl'cO such that 

\\E\\ L °° m = 0(h a /z) + 0(z 2 ) for h,z^0. 

For the SL scheme we use here, an estimate of the form (|13p in the particular 
case I = 1 (under assumptions weaker than those used in Theorem 13. ip can be 
found in [27| . The proof of the theorem is postponed in the Appendix. 

4 Numerical experiments 

We have tested the feasibility and relevance of combining the HJB and PMP 
methods on four optimal control problems. Each of these problems highlights a 
particular difficulty from the control point of view. 

Problem 1 (section 14. ip is a two-dimensional minimum time problem with 
local and global minima. We will see in this example that the shooting method 
is very sensitive with respect to the initial guess (as usual). 

Problem 2 (section l4~2"|) is a two-dimensional controlled Van der Pol oscillator 
with control switchings. 

Problem 3 (section 14. 3p is the well-known Goddard problem with singular 
arcs, in the one-dimensional case (total state dimension is three). 

Problem 4 (section 14. 4p is another minimum time target problem in dimen- 
sion four, with a first-order state constraint. 

Details for HJB implementation. The algorithm is written in C++ and 
ran on a PC with an Intel Core 2 Duo processor at 2.00 GHz and 4GB RAM. 
The code is not parallelized. The reported CPU time is the time required for 
the whole process, which includes the computation of the value function, its 
gradient, the optimal trajectory and save the results on file. 
The grid G has N\ X . . . X Nd nodes. Grid cells have the same size k in any 
dimension. The set of admissible controls U is discretized in Nc equispaced 
discrete controls {uj, j = 1, . .. , Nc}- The (fictitious) time step h is variable, 
and chosen in such a way h\f(xi,Uj)\ = k for any Xi and Uj, so that the stencil 
is limited to the eight neighbouring nodes (plus the considered node itself). The 
stop criterion for the fixed point iterations ((9]) is — t)W ||.t°o(n) < e — 

10" 5 . 

Details for PMP implementation. The algorithm is written in Fortran 90 
and ran on a PC with an Intel Core 2 Duo processor at 2.33 GHz and 2GB RAM. 
We used the ShootQ software which implements a shooting method with the 
Hybrd [5Hj solver. For the four problems studied we set the ODE integration 
method to a basic 4th-order Runge-Kutta with 100 steps. 

4.1 Minimum time target problem 

The first example illustrates how a local solution can affect the shooting method. 
We consider a simple minimum time problem in two dimensions. The goal is to 

ihttp: / /www. cmap.polytechnique.fr/~martinon/ 
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reach a given position on the plane by controlling the angle of the velocity. We 
choose the velocity in such a way the cost functional has multiple minima. 

min tf 

yi(t) = c(yi(t),y 2 (t))cos(u(t)) 
(p\) Vi{t) = c(yi (t) , y 2 (t) ) sm(u(t) ) 
{Pl) ] L7 = [0, 2tt) 

2/(0) =x = (-2.5,0) 
, !/(*/) = (3,0) 

with 

/ \ f 1 if 2/2 < 1 

c(yi ' y2) = \ 0/ 2 -i) 2 + i if W >l. 

Due to the expression of c, we have at least two minima. The simplest one 
corresponds to a straight line trajectory (— ) along the 2/1 axis with 2/2 = 0. The 
other one has a curved trajectory (n) that takes advantage of the larger values 

of J/2- 



4.1.1 PMP and shooting method 

We first try to solve the problem with the PMP and the shooting method. 
Therefore we seek a zero of the shooting function defined by 

/ t f \ fyi(t f )-3\ 

Si: U(0) - V2(tf) . 
\P2(0)J \P3(tf) - 1/ 



Global and local solutions. Depending on the initial guess, the shooting 
method can converge to a local or global solution (Fig. [5]). The most common 
local solution is the straight line trajectory from x to C := {(3,0)}, with a 
constant control u = and a final time Ti oca i — 5.5. The global solution has an 
arch shaped trajectory that benefits from the higher speed for increasing values 
of 2/2, with a final time t% = T g i t a i = 4.868. 



STATE COSTATE CONTROL STATE COSTATE CONTROL 

5 1 8r 5 7.5 




Figure 2: (Pi) - Global solution (curved trajectory) and local solution (straight 
trajectory) found by the shooting method. Control must be intended modulo 
2tt. 
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Sensitivity with respect to the starting point. Even for this simple prob- 
lem, the shooting method is very sensitive to the starting point. Numerical tests 
indicate that it converges in most cases to local solutions. We ran the shooting 
method with a batch of 441 values for p(0), equally distributed in [— 10, 10] 2 , 
and two different starting guesses for the final time (Fig. [3]). We observe that 
for the batch with the tf = 1 initialization, 11% of the shootings converge to 
the global solution, 60% to the straight line local solution, and 24% to another 
local solution with an even worse final time (tf = 6.06). Remaining 5% does not 
converge at all. For the batch with the tf = 10 initialization, 9% of the shoot- 
ings converge to the global solution, 50% and 29% to the two local solutions, 
and 12% does not converge. Obviously, just taking a random starting point is 
not a reliable way to find the global solution. 



CONVERGENCE STUDY FOR P(0) 6 [-10,10] AND T = 1 



GLOBAL SOLUTION (T.4.868):49 [11%] 
LOCAL SOLUTION (T.5.5): 265 [60%] 
LOCAL SOLUTION (T=6.06): 107 [24%] 



P,f0) 



CONVERGENCE STUDY FOR P(0) e [-1 0,1 Of AND T = 10 



GLOBAL SOLUTION (T.4.868): 38 [9%] 
LOCAL SOLUTION (T.5.5): 222 [50%] 
LOCAL SOLUTION (T.6.06): 127 [29%] 



P,(0) 



Figure 3: (Pi) - Convergence from a random initialization. 



4.1.2 Solving the problem with the HJB approach 

In Fig. QJ we show the level sets of the minimum time function T associated 
to the control problem (Pi). The numerical domain is fl = [— 6, 6] 2 . As it 
can be seen, the minimum time function is not diffcrcntiablc everywhere. The 
curve of the discontinuity of the gradient represents here the set of the initial 
points associated to two optimal trajectories. The superdifferential D + T at the 
points of non-differentiability is non empty. Notice that here the minimal time 
function remains differentiable along each trajectory. We will see in section |4~21 
a different type of non-differentiability for the value function. 

4.1.3 Coupling the HJB and PMP approaches 

We now use the data provided by the HJB approach to obtain a starting point 
close to the global solution. The HJB solution provides not only an approxi- 
mation of the costate p(0), but also an estimate of the optimal final time t*j. 
In Table [TJ we summarize the results obtained by solving the HJB equation on 
several grids. As we can see, the outcome is not very sensitive with respect to 
the discretization parameters. This means that the choice of a very rough grid 
can be sufficient to obtain a good initial guess for PMP. In fact, the shooting 
method immediately converges to the global solution when using the starting 
point obtained by the HJB method on the coarsest grid (Tabled]). 
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-6 -4 -2 2 4 6 

SL-FS 



Figure 4: (Pi) - Level sets of the minimum time function T, the optimal tra- 
jectory starting from (—2.5,0) and the two optimal trajectories starting from 
(-1.835,0). 



Ni x N 2 


N c 


t* 


p(0) 


CPU time (sec) 


25 x 25 


16 


4.895 


(-0.049, -1.000) 


0.08 


50 x 50 


16 


4.895 


(-0.048, -1.000) 


0.37 


200 x 200 


32 


4.878 


(-0.051, -1.000) 


20.25 



Table 1: (Pi) - HJB approach: minimal time and initial costate associated to 
the optimal trajectory. 





t* 
l .f 


p(0) 


Initialization by HJB 


4.89 


(-0.05,-1) 


Solution by PMP 


4.868 


(-5.552 x 10" 2 , -9.985 x 10" 1 ) 



Table 2: (Pi) - Initialization by HJB and solution by PMP. 



We can check that the convergence of the shooting method is much easier in 
a neighbourhood of the HJB initialization. We test again a batch of 441 values 
for p(Q), equally distributed in [—0.1,0] x [—2,0], which corresponds to a 100% 
range around the HJB initialization (—0.05, —1). We also set tt = 4.89. This 
time the shooting method finds the global solution for 76% of the points, and 
only 12% and 9% for the local solutions (Fig. EJ. 

We also consider the case of a starting point very close to the curve where the 
minimal time function is not differentiable: x — (—1.835, 0). The HJB equation 
is solved on a 300 x 300 grid, with Nc = 32. The two optimal trajectories are 
shown in Fig. 0] Here the approximation of p(0) (see section l3~Tj) gives the two 
directions p x (0) = (-0.05,-1.00) and p 2 (0) = (-0.99,0.00). Using these two 
values to initialize the shooting method, we obtain the two distinct solutions 
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CONVERGENCE STUDY NEAR HJB INITIALIZATION 
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+ GLOBAL SOLUTION (T=4.868): 337 [76%; 
□ LOCAL SOLUTION (T=5.5) : 53 [1 2%] 
LOCAL SOLUTION (T=6.06): 39 [9%] 




1 -0.08 


-0.06 -0.04 -0.02 
P.(0) 



Figure 5: (Pi) - Convergence to the global solution is much easier near the HJB 
initialization. 



with the "cap" and "straight" trajectories (Table [3]). 





f* 


P(0) 


Initialization by HJB (n) 


4.84 


(-0.05,-1) 


Solution by PMP (n) 


4.8246 


(-7.67 x 10- 2 ,-9.97 x lO" 1 ) 


Initialization by HJB (— ) 


4.84 


(-0.99,0) 


Solution by PMP (-) 


4.835 


(-1,-6.2137 x 10" ie ) 



Table 3: (Pi) - Initialization by HJB and solution by PMP (two global solu- 
tions). 



4.2 Van der Pol oscillator 

The second test problem is a controlled Van der Pol oscillator. Here we want to 
reach the steady state (3/1,2/2) = (0,0) in minimum time. It is well known that 
the optimal trajectories, for this problem, are associated to bang-bang control 
variables. 

min tf 

yx(t) = y2(t) 

( p J h(t) = -yi(t) + y 2 (t)(l- yi (t) 2 ) + u(t) 
( P2 M £7 = [-1,1] 

y(0) =x= (1,-0.8) 

k y(*/) = (o,o) 

4.2.1 PMP and shooting method 

Here, the Hamiltonian is linear with respect to u, therefore we have a bang-bang 
control with the switching function ip(y,p) = H u (y,p,u) = The shooting 
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function is defined by 



S 2 




yi(tf) 



We test the shooting method with the same initial points as for problem (Pi). 
The convergence results are even worse in this case: for the tf — 1 initialization, 
only 9% of the shootings converge to the global solution, and 0.5% for the 
tf = 10 initialization. 

4.2.2 Solving the problem with the HJB approach 

Here we use the HJB approach to compute the minimal time function. In Fig. 
[51 we show the level sets of the solution obtained in O = [— 2,2] 2 . As in the 




Figure 6: (P2) - Level sets of function T and the optimal trajectory starting 
from (1,-0.8). 



previous problem, the value function is not differentiable everywhere, but here 
the curve of non-differentiability has a different nature. It can no more be seen 
as the curve of collision between two fronts and is not caused by the existence 
of multiple optimal solutions. It corresponds to the points where the control 
switches between —1 and +1. Taking such a starting points we have a solution 
with a constant control u = ±1 and no switches. Finally we observe that at the 
points of non-differentiability the superdifferential is empty. 

4.2.3 Coupling the HJB and PMP approaches 

As before, the numerical solution of the HJB equation provides an approxima- 
tion of the final time tf and an initial costate p(0). This information is used 
here to start the shooting algorithm. Once again, the HJB initialization gives 
an immediate accurate convergence to the optimal solution, see Table H] and 
Fig. [7] In this example, the control discontinuities hinder the convergence by 
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*/ 


p(0) 


Initialization by HJB 


4.2 


(1.2,-4.2) 


Solution by PMP 


3.837 


(1.249,-3.787) 



Table 4: (P 2 ) - Initialization by HJB and solution by PMP. 



testing different integration schemes for the state and costate pair (y,p). Using 
a fixed step integrator (4th-order Runge-Kutta) without any precaution gives a 
very poor convergence with a norm of k 10 -3 for the shooting function. Using 
either a variable step integrator (Dopri [23]) or a switching detection method 
for the fixed step integrator [30] we get much better results (« 10 -11 for the 
shooting function norm). 



TRAJECTORY CONTROL STATE COSTATE 




Figure 7: (P 2 ) - Solution with one switch for the Van der Pol oscillator (shooting 
method). 



We now test two other starting points positioned very close to the curve 
where the value function is not diffcrcntiablc, namely x = (1.5,-0.67) and 
x = (1, —0.57). Computation of the gradient is performed as described in section 
13. ll in the case T is not differentiable, even if here that method is not in principle 
applicable due to the different nature of the non-differentiability We observe 
that the shooting method finds solutions with a switch immediately after the 
initial time or just before the final time. Here the initial guesses for the costate 
p(0) provided by the HJB method are not so close to the right ones, but they 
are sufficient to obtain convergence. Conversely, the minimum times given by 
HJB are rather close to the exact ones (Table [5]) . 





*/ 


p(0) 


Init. by HJB (x = (1.5, -0.67)) 


3.0 


(1.62,-0.87) 


Sol. by PMP [x = (1.5, -0.67)) 


2.9594 


(1.487,2.309 x lO" 3 ) 


Init. by HJB (x = (1,-0.57)) 


2.2 


(1.96,-0.10) 


Sol. by PMP (x = (1,-0.57)) 


2.1351 


(1.715,1.111 x 1Q-' A ) 



Table 5: (P 2 ) - Initializations by HJB and solutions by PMP, non-diffcrcntiable 
case. 
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4.3 Goddard problem 



The third example is the well-known Goddard problem (see for instance |311 
[331 [Ml [23 OS]), to illustrate the case of singular arcs. This problem models 
the ascent of a rocket through the atmosphere. We restrict ourselves to vertical 
(monodimensional) trajectories: the state variables are the altitude, velocity and 
mass of the rocket during the flight, so we have d = 3. The rocket is subject to 
gravity, thrust and drag forces. The final time is free, and the objective is to 
reach a certain altitude with a minimal fuel consumption. 



min f Q f bT max u 
r = v 

v = --pi + ^(T max u - D(r,v)) 
(P?) { rh= -bT max u 
U = [0, 1] 

r(0) = 1, v(0) = 0,m(0) = 1, 
I r(t f ) > 1.01 



with the parameters used for instance in 
D(r,v) = SlO^e" 500 ^- 1 ). 



2 T 



3.5 and drag 



4.3.1 PMP and shooting method 

As for (P2), the Hamiltonian is linear with respect to u, and we have a bang- 
bang control with possible switchings or singular arcs. The switching function 
is tp{y,p) = H u (y,p,u) = T max ((l -p m )b+ and the singular control can be 
obtained by formally solving ip — 0. The main difficulty, however, is to deter- 
mine the structure of the optimal control, namely the number and approximate 
location of singular arcs. The HJB approach is able to provide such informa- 
tion, in addition to the initial costate p(0) and optimal time tj. Assuming for 
instance one interior singular arc, the shooting function is defined by 

/i / , Pl (o),p 2 (o),p 3 (o)\ Mt f ) - im, P 2(t f ),p 3 (t f ), Pi (t f ) 

^3 ■ I Gentry 

h-» 1p(y(t entry), P(t entry)) 

V texit J V 1p(y(t entry), P(t entry)) 

4.3.2 Solving the problem with the HJB approach 

The Goddard problem is also hard to solve with the HJB approach, especially 
because the computation of the value function needs a huge number of iterations 
to converge and the solution is quite sensible to the choice of the numerical box 
f2 in which the value function is computed. In Fig. [H we show the optimal 
trajectory and the optimal control computed by HJB in £1 = [0.998, 1.012] x 
[—0.02,0.18] x [0.1,1.8]. As we can see, the HJB approach does not give a 
good approximation of the optimal control (vertical lines correspond to strong 
oscillations of the solution). 

4.3.3 Coupling the HJB and PMP approaches 

As for problems (Pi) and (P2), the HJB solution provides an estimate of the 
final time t*f and initial costate p(0). Moreover, an examination of the HJB 
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Figure 8: (P3) - Goddard problem, solution by HJB approach (first line: altitude 
and velocity. Second line: mass and control). 



solution gives a good idea of the structure of the optimal control and the optimal 
trajectory: the change of slope on the velocity clearly visible in Fig. [8] indicates 
an interior singular arc at (i entry >t exit) ~ (0.02,0.06). The same information 
can be deduced by the optimal control, strongly oscillating in the same time 
interval. Initializing the shooting method by means of these rough guesses, once 
again we obtain a quick convergence to the correct solution with the expected 
singular arc (Table [6] and Fig. [9|) . 





t* 


(gentry ? ^exit ) 


p(Q) 


Init. by HJB 


0.17 


(0.02,0.06) 


(-7.79,-0.31,0.04) 


Sol. by PMP 


0.1741 


(0.02351,0.06685) 


(-7.275,-0.2773,0.04382) 



Table 6: (P 3 ) - Initialization by HJB and solution by PMP. 



4.4 Minimum time target problem with a state constraint 

This fourth example aims to illustrate the case of a state constraint, as well as 
a four-dimensional problem for the HJB approach. The goal is to move a point 
on a plane, from an initial position to a target position, with a null initial and 
final velocity. The control is the direction of acceleration, and the objective is 
to minimize the final time. We add a state constraint which limits the velocity 
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CONTROL 

STATE COSTATE 




Figure 9: (P3) - Goddard problem, solution by PMP method. 



along the x-axis. 

min tf 

yi(t) = y 3 (t) 

fa{t) = y 4 (t) 
y 3 (t) = cos(u(t)) 
(P 4 ){ y A (t)=shx(u{t)) 
U = [0, 2tt) 

1/(0)=* =(-3, -4, 0,0) 
y(t f ) = (3,4,0,0) 
y 3 (t)<l te(0,t/) 

Let us write the state constraint as g(y(t)) < 0, with g defined by g(y) = j/3 — 1. 
The control appears explicitly in the first time derivative of g, so the constraint 
is of order 1, and we have: 

g(y(t)) = cos(u(t)), g y (y) = (0,0, 1,0). 

When the constraint is not active, minimizing the Hamiltonian gives the optimal 
control u* via 

(Pa, Pi) 



(cos(w*), sin('u*)) = — 



VpI + pI 



Over a constrained arc where g(y) — 0, the equation g{y, u) — and minimizing 
the Hamiltonian H leads to 

u* = -sign(p 4 )^. 

Then the condition H u = gives the value for the constraint multiplier /1 = —p 3 . 
At the entry point we have a jump condition for the costate: 



ith 7T 



entry 



P(ttntry) ~ Pi^entry) Gentry gy, 

£ I an additional shooting unknown. Compared to the uncon- 



strained problem, we have three more unknowns t 



entry ? texit 



and 7T ' en t r y . The cor- 



responding equations are the Hamiltonian continuity at t entrv and t e xit (which 
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boils down to P3 = 0), and the tangential entry condition g{y{t en try)) = 0. The 
shooting function is defined by 




0246 0246 
Vx Vy 



Figure 10: (P4) - Solution with a constrained arc by the HJB approach. 

In Fig. Uni we show the numerical solution obtained by using the HJB ap- 
proach in [—5, 5] 2 x [—2, 4] 2 . In addition to the optimal final time and the initial 
costate, the HJB solution gives an estimate of the bounds for the constrained 
arc where 2/3 = 1. The only shooting unknown for which we were not able 
to obtain relevant information is the multiplier n entry for the costate jump at 
t en try Therefore we used it entry = 0.1 as a starting guess, which turned out to 
be sufficient for the shooting method to converge properly (Table [7]). Fig. fTTI 
shows the corresponding solution, much cleaner than the HJB solution but with 
the same structure. We checked that the condition /i > was satisfied over the 
boundary arc as ps is negative, and p$ = at both entry and exit of the arc 
as requested by the Hamiltonian continuity conditions. The actual value of the 
multiplier for the jump on p% is Gentry — 4.1294. 





t* 


(gentry ) ^exit ) 


p(0) 


Init. by HJB 


7.5 


(1.35,5.6) 


(-0.51,-0.24,-0.89,-0.61) 


Sol. by PMP 


7.0356 


(1.137,5.899) 


(-0.867, -0.047, -0.986, -0.167) 



Table 7: (P 4 ) - Initialization by HJB and solution by PMP. 



4.5 Discretization parameters and CPU times 

In Table [8] we report the discretization parameters for HJB used in the four 
numerical tests. Note that the Van der Pol problem needs a rather fine grid to 
obtain a sufficient accuracy. In Table [5] we report the CPU times and the norm 
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5 2r 




Figure 11: (P4) - Solution with a constrained arc by PMP approach. 



of the shooting function at the end of the computations. The large time for the 
Goddard problem is due to the huge number of iteration needed by HJB. 





Problem 1 


Problem 2 


Problem 3 


Problem 4 


N x x ...xN d 


25 x 25 


200 x 200 


20 x 20 x 20 


20 x 20 x 20 x 20 


N c 


16 


2 


20 


16 



Table 8: Summary of discretization parameters for HJB 





Problem 1 


Problem 2 


Problem 3 


Problem 4 


HJB 


8 x 10-* 


2.98 


211 


182 


PMP 


3 x 10- 3 


7 x 10- y 


3 x 


2 x 10~ 2 


\s\ 


2.82 x 10~ lb 


8.14 x 10- 11 


1.12 x lO" 7 


6.68 x 10- 11 



Table 9: Summary of CPU times (seconds) and shooting function norm 



5 Conclusions and future work 

The known relation between the gradient of the value function in the HJB 
approach and the costate in the PMP approach allows to use the HJB results 
to initialize a shooting method. With this combined method, one can hope to 
benefit from the optimality of HJB and the high precision of PMP. 

We have tested the combined approach on four control problems presenting 
some specific difficulties. The numerical tests also included two cases where the 
value function is not differentiable. For these four problems, the HJB approach 
provides useful data such as an estimate of the initial costate p(0), the optimal 
final time t*j , and the structure of the optimal solution with respect to singular 
or constrained subarcs. In each case this information allowed us to successfully 
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initialize the shooting method. The fact that the optimal control computed by 
HJB was sometimes far from the exact control did not seem to be a main issue 
for the shooting method initialization. The total computational time for the 
combined HJB-PMP approach did not exceed four minutes, up to dimension 
four. This probably allows to run experiments in higher dimensions, like five or 
six. Moreover, the code for the HJB equation is easy parallelizable [37J. 

Even if the main limitation of the proposed method appears to be the state 
dimension (imposed by HJB), as we have seen this does not mean that only 
simple problems can be solved. We plan to apply this approach more specifically 
to trajectory optimization for space launchers: these problems are still hard 
despite having a low dimension, typically 3/4 for coplanar flight and 5/6 for 
a 3D flight. Experiments are in progress for the Ariane 5 launcher (mission 
toward the GTO orbit) and for a prototype of a reusable launcher with wings 
(toward the LEO orbit). 

Appendix 

Proof of Theorem \3.1\ Given the numerical domain Q, we define the set O' as 
f2':={iE6R d : v(x;h,n)< min v(x'; h, fl)}. 

The set fi is the box in which the approximate solution is actually computed 
and f2' represents the subset of il in which the solution is not affected by the 
fictitious boundary conditions we need to impose at dQ to make computation. 
From the front propagation point of view, <9f2' represents the front at the time 
it touches dfl for the very first time. 

Let us define v max := (1 — e~ Tma:c ) and fix a; £ f2'. We have 

T(x) < T max < +oo and v(x) < v max < 1. 
By (fT3"l) we have 

v(x; h) < v(x) + Ch a < v max + Ch a . 
Since v max < 1 there exists ho > such that 

v m ax + Ch a < 1 for all < h < ho 

then we can define 

Vmax ■ — Vmax ^ ^ 

and we have 

v(x) < v max < v ma x and v{x\ h) < Vmax for all x £ Q' , < h < h . 

For any fixed x G Q', it exists £ x £ [min{v(x),v(x;h)},max{v(x),v(x]h)}] such 
that 



\T{x)-T(x)\ = ln(l -«(»)) -ln(l-v(x;h)) 
Since ^ < v ma x, we have 





1 




1- 



\v(x) — v(x; h)\ 



Ch a 

\T(x) -T(x)\ < for all x e n' and < h < h 

1 ^in.ax. 
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and then it exists a positive constant C 2 which depends by the problem's data 
and on O such that 

II? - 2llz°°(fi') < C* 2 /i" for all < h < h . (14) 

We are now ready to recover an estimate on the gradient of the approximate 
solution T. By (fT4|) we know that, for any i = 1, . . . , d 



and 



T(x + z&i) = T(x + ze.j) + Ex with \Ex\ < C 2 h a 
f(x - ze{) = T(x - zei) + E 2 with \E 2 \ < C 2 h a . 



So we have 



n ~, \ T ( x + ggO + gi - (T(a; - ze a ) + £ 2 )) ~ E x - E 2 

D t T(x) = = DiT{x) 



so that 



and then 



\DiT(x) - DiT(x)\ < 



2z v ' 2z 

E\ — E 2 



2z 



h a 

<C 2 — 

z 



\\DT(x)~DT(x)\\ OQ <C 2 — . 

z 

We finally obtain, for x £ f2' and < h < ho, 

WDTixyDTix)]^ < \\DT (xyDT^Wov+WDTixyDTix)^ =o(— \+0(z 2 



z 

and the conclusion follows. □ 
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